Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove allocations in ErorrFacts #75234

Closed
wants to merge 1 commit into from
Closed

Conversation

jaredpar
Copy link
Member

When profiling our unit tests I discovered that ErrorFacts are a significant source of string allocations. In Emit for example it can be over 200MB of allocations.

This is due to our use of Enum.ToString as well as concat that value with well known constants. To avoid these allocations I changed our code generator to produce these strings at compile time.

I was a bit concerned about the performance of the switch compared to the existing code so I wrote a quick benchmark to demonstrate that there is no perf lost here.

Method Runtime Mean Ratio Gen0 Allocated Alloc Ratio
EnumToString .NET 8.0 66.80 us 1.00 - 45046 B 1.00
EnumToStringHelper .NET 8.0 15.76 us 0.24 - - 0.00
EnumToString .NET Framework 4.7.2 512.30 us 7.69 21.4844 140197 B 3.11
EnumToStringHelper .NET Framework 4.7.2 17.70 us 0.27 - - 0.00

When profiling our unit tests I discovered that `ErrorFacts` are a
significant source of `string` allocations. In `Emit` for example it can
be over 200MB of allocations.

This is due to our use of `Enum.ToString` as well as concat that value
with well known constants. To avoid these allocations I changed our code
generator to produce these strings at compile time.

I was a bit concerned about the performance of the `switch` compared to
the existing code so I wrote a quick benchmark to demonstrate that there
is no perf lost here.

| Method             | Runtime              | Mean      | Ratio | Gen0    | Allocated | Alloc Ratio |
|------------------- |--------------------- |----------:|------:|--------:|----------:|------------:|
| EnumToString       | .NET 8.0             |  66.80 us |  1.00 |       - |   45046 B |        1.00 |
| EnumToStringHelper | .NET 8.0             |  15.76 us |  0.24 |       - |         - |        0.00 |
| EnumToString       | .NET Framework 4.7.2 | 512.30 us |  7.69 | 21.4844 |  140197 B |        3.11 |
| EnumToStringHelper | .NET Framework 4.7.2 |  17.70 us |  0.27 |       - |         - |        0.00 |
@jaredpar jaredpar requested review from a team as code owners September 25, 2024 15:57
@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Sep 25, 2024
@jaredpar
Copy link
Member Author

@dotnet/roslyn-compiler PTAL

@@ -8129,7 +8129,7 @@ void binaryOperator(string op, string leftType, string rightType, string expecte

if (expectedDiagnostics.Length == 0)
{
CompileAndVerify(comp, verify: Verification.FailsPEVerify);
CompileAndVerify(comp, verify: Verification.FailsPEVerify, emitOptions: EmitOptions.Default);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emitOptions: EmitOptions.Default

What effect does this change have?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a different perf fix I was working on that I accidentally included with this change. I'll revert this part

@cston
Copy link
Member

cston commented Sep 25, 2024

            case ErrorCode.ERR_NoMetadataFile: return "ERR_NoMetadataFile_Title";

Do we need a title string for errors?


Refers to: src/Compilers/CSharp/Portable/Generated/ErrorFacts.Generated.cs:2626 in cbcb2e5. [](commit_id = cbcb2e5, deletion_comment = False)

@cston
Copy link
Member

cston commented Sep 25, 2024

            case ErrorCode.ERR_NoMetadataFile: return "ERR_NoMetadataFile_Description";

Do we need a description string for error codes other than warnings?


Refers to: src/Compilers/CSharp/Portable/Generated/ErrorFacts.Generated.cs:4509 in cbcb2e5. [](commit_id = cbcb2e5, deletion_comment = False)

@AlekseyTs
Copy link
Contributor

    public static string ToStringWithDescription(ErrorCode code)

I am not sure what is the common pattern for using this API, but it looks like our resources contain only 20 ids like that and all of them are for warnings.


Refers to: src/Compilers/CSharp/Portable/Generated/ErrorFacts.Generated.cs:4157 in cbcb2e5. [](commit_id = cbcb2e5, deletion_comment = False)

@AlekseyTs
Copy link
Contributor

    public static string ToStringWithTitle(ErrorCode code)

Similar to ToStringWithDescription, I think that only warnings have title ids in our resources.


Refers to: src/Compilers/CSharp/Portable/Generated/ErrorFacts.Generated.cs:2274 in cbcb2e5. [](commit_id = cbcb2e5, deletion_comment = False)

@AlekseyTs
Copy link
Contributor

Do we understand why tests are requesting the strings? Would it be possible to decrease the allocation just by making changes to the test infrastructure?

@AlekseyTs
Copy link
Contributor

If we go with the change, we probably should adjust the following comment at the end of ErrorCode.cs:

        // Note: you will need to do the following after adding errors:
        //  1) Update ErrorFacts.IsBuildOnlyDiagnostic (src/Compilers/CSharp/Portable/Errors/ErrorFacts.cs)

        // Note: you will need to do the following after adding warnings:
        //  1) Re-generate compiler code (eng\generate-compiler-code.cmd).
        //  2) Update ErrorFacts.IsBuildOnlyDiagnostic (src/Compilers/CSharp/Portable/Errors/ErrorFacts.cs)

@AlekseyTs
Copy link
Contributor

I am not sure that we are optimizing for a common real-world scenario, but we are definitely paying with more memory usage by the compiler (string heap, code size, etc.) for all scenarios.

@jaredpar
Copy link
Member Author

@AlekseyTs

Do we understand why tests are requesting the strings?

The primary method that causes issues is ErrorFacts.GetMessage. That is called indirectly in a lot of places in our test infrastructure. There are a few central places like DiagnosticsInfo.AssertExpectedMessageArgumentsLength but hard to change those to not realize the string in some way.

Would it be possible to decrease the allocation just by making changes to the test infrastructure?

The places I've dug into need to realize the string in some way to do their job.

I am not sure that we are optimizing for a common real-world scenario, but we are definitely paying with more memory usage by the compiler (string heap, code size, etc.) for all scenarios.

I agree that is a consideration we should make. The ErrorCode.ToString() path does seem to be both a real world and test concern. Creating diagnostics is common, and often loads of diagnostics, and today that has unnecessary allocations.

The WithTitle and WithDescription allocations are much smaller comparatively. There are a few test assemblies where those show up more prominently but overall they're not as critical to our test allocations as ErrorFacts.ToString. If we're concerned about expanding the string heap that much could definitely remove those.

@jaredpar
Copy link
Member Author

I am not sure that we are optimizing for a common real-world scenario, but we are definitely paying with more memory usage by the compiler (string heap, code size, etc.) for all scenarios.

I dug into this a bit more and wrote a program to dump the UserString heap size and then did before / after comparisons.

Configuration Size (bytes)
Existing Behavior 316,768
ToString generated 316,812
ToString, Description and Title Generated 585,432

Looking at that I agree that the price we're paying for ToStringWithTitle and ToStringWithDescription is too high and that should be backed out. The change to take ToString seems reasonable at a glance to me.

@jaredpar
Copy link
Member Author

I dug into this some more.

The reason that there was not a significant increase in the user string heap here is that because the majority of these strings already existed there. This is because of our .resx generation pass where these values, in a longer form, were placed on the user string heap already. That meant this was a roughly net neutral change for the ToString case.

However the method that is generated by this code is extremely large and significantly increases the size of the resulting binary. The size increase simply isn't worth it given this problem is largely about testing, not product scenarios. As such abandoning this approach for now. May revisit later with a more targeted approach.

@jaredpar jaredpar closed this Oct 16, 2024
@jaredpar jaredpar deleted the test-alloc5 branch October 16, 2024 21:16
@RikkiGibson
Copy link
Contributor

It would be nice if the ErrorCode member names were laid out contiguously in some way we could rely on and we could literally index into them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants